Goto

Collaborating Authors

 feedback gain



Policy Gradient with Self-Attention for Model-Free Distributed Nonlinear Multi-Agent Games

Sebastián, Eduardo, Keskar, Maitrayee, Iqbal, Eeman, Montijano, Eduardo, Sagüés, Carlos, Atanasov, Nikolay

arXiv.org Artificial Intelligence

Abstract-- Multi-agent games in dynamic nonlinear settings are challenging due to the time-varying interactions among the agents and the non-stationarity of the (potential) Nash equilibria. In this paper we consider model-free games, where agent transitions and costs are observed without knowledge of the transition and cost functions that generate them. We propose a policy gradient approach to learn distributed policies that follow the communication structure in multi-team games, with multiple agents per team. Our formulation is inspired by the structure of distributed policies in linear quadratic games, which take the form of time-varying linear feedback gains. In the nonlinear case, we model the policies as nonlinear feedback gains, parameterized by self-attention layers to account for the time-varying multi-agent communication topology. We demonstrate that our distributed policy gradient approach achieves strong performance in several settings, including distributed linear and nonlinear regulation, and simulated and real multi-robot pursuit-and-evasion games.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

The training data is stochastic but the network itself seems deterministic. Rebuttal ======== Thank you for the additional information. Q2 LIGHT REVIEW The paper shows promising results but should be restructured.


Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control: Non-Penalty Approach

Feng, Lechen, Li, Xun, Ni, Yuan-Hua

arXiv.org Artificial Intelligence

In [1], the distributed linear-quadratic problem with fixed communication topology (DFT-LQ) and the sparse feedback LQ problem (SF-LQ) are formulated into a nonsmooth and nonconvex optimization problem with affine constraints. Moreover, a penalty approach is considered in [1], and the PALM (proximal alternating linearized minimization) algorithm is studied with convergence and complexity analysis. In this paper, we aim to address the inherent drawbacks of the penalty approach, such as the challenge of tuning the penalty parameter and the risk of introducing spurious stationary points. Specifically, we first reformulate the SF-LQ problem and the DFT-LQ problem from an epi-composition function perspective, aiming to solve constrained problem directly. Then, from a theoretical viewpoint, we revisit the alternating direction method of multipliers (ADMM) and establish its convergence to the set of cluster points under certain assumptions. When these assumptions do not hold, we show that alternative approaches combining subgradient descent with Difference-of-Convex relaxation methods can be effectively utilized. In summary, our results enable the direct design of group-sparse feedback gains with theoretical guarantees, without resorting to convex surrogates, restrictive structural assumptions or penalty formulations that incorporate constraints into the cost function.


Disturbance Estimation of Legged Robots: Predefined Convergence via Dynamic Gains

Li, Bolin, Cai, Peiyuan, Zuo, Gewei, Zhu, Lijun, Ding, Han

arXiv.org Artificial Intelligence

In this study, we address the challenge of disturbance estimation in legged robots by introducing a novel continuous-time online feedback-based disturbance observer that leverages measurable variables. The distinct feature of our observer is the integration of dynamic gains and comparison functions, which guarantees predefined convergence of the disturbance estimation error, including ultimately uniformly bounded, asymptotic, and exponential convergence, among various types. The properties of dynamic gains and the sufficient conditions for comparison functions are detailed to guide engineers in designing desired convergence behaviors. Notably, the observer functions effectively without the need for upper bound information of the disturbance or its derivative, enhancing its engineering applicability. An experimental example corroborates the theoretical advancements achieved.


Robustified Time-optimal Point-to-point Motion Planning and Control under Uncertainty

Zhang, Shuhao, Swevers, Jan

arXiv.org Artificial Intelligence

This paper proposes a novel approach to formulate time-optimal point-to-point motion planning and control under uncertainty. The approach defines a robustified two-stage Optimal Control Problem (OCP), in which stage 1, with a fixed time grid, is seamlessly stitched with stage 2, which features a variable time grid. Stage 1 optimizes not only the nominal trajectory, but also feedback gains and corresponding state covariances, which robustify constraints in both stages. The outcome is a minimized uncertainty in stage 1 and a minimized total motion time for stage 2, both contributing to the time optimality and safety of the total motion. A timely replanning strategy is employed to handle changes in constraints and maintain feasibility, while a tailored iterative algorithm is proposed for efficient, real-time OCP execution.


Haptic feedback of front car motion can improve driving control

Cheng, Xiaoxiao, Geng, Xianzhe, Huang, Yanpei, Burdet, Etienne

arXiv.org Artificial Intelligence

This study investigates the role of haptic feedback in a car-following scenario, where information about the motion of the front vehicle is provided through a virtual elastic connection with it. Using a robotic interface in a simulated driving environment, we examined the impact of varying levels of such haptic feedback on the driver's ability to follow the road while avoiding obstacles. The results of an experiment with 15 subjects indicate that haptic feedback from the front car's motion can significantly improve driving control (i.e., reduce motion jerk and deviation from the road) and reduce mental load (evaluated via questionnaire). This suggests that haptic communication, as observed between physically interacting humans, can be leveraged to improve safety and efficiency in automated driving systems, warranting further testing in real driving scenarios.


Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot

Guan, Neil, Yu, Shangqun, Zhu, Shifan, Kim, Donghyun

arXiv.org Artificial Intelligence

Replicating the remarkable athleticism seen in animals has long been a challenge in robotics control. Although Reinforcement Learning (RL) has demonstrated significant progress in dynamic legged locomotion control, the substantial sim-to-real gap often hinders the real-world demonstration of truly dynamic movements. We propose a new framework to mitigate this gap through frequency-domain analysis-based impedance matching between simulated and real robots. Our framework offers a structured guideline for parameter selection and the range for dynamics randomization in simulation, thus facilitating a safe sim-to-real transfer. The learned policy using our framework enabled jumps across distances of 55 cm and heights of 38 cm. The results are, to the best of our knowledge, one of the highest and longest running jumps demonstrated by an RL-based control policy in a real quadruped robot. Note that the achieved jumping height is approximately 85% of that obtained from a state-of-the-art trajectory optimization method, which can be seen as the physical limit for the given robot hardware. In addition, our control policy accomplished stable walking at speeds up to 2 m/s in the forward and backward directions, and 1 m/s in the sideway direction.


Task-Space Riccati Feedback based Whole Body Control for Underactuated Legged Locomotion

Yang, Shunpeng, Hong, Zejun, Li, Sen, Wensing, Patrick, Zhang, Wei, Chen, Hua

arXiv.org Artificial Intelligence

This manuscript primarily aims to enhance the performance of whole-body controllers(WBC) for underactuated legged locomotion. We introduce a systematic parameter design mechanism for the floating-base feedback control within the WBC. The proposed approach involves utilizing the linearized model of unactuated dynamics to formulate a Linear Quadratic Regulator(LQR) and solving a Riccati gain while accounting for potential physical constraints through a second-order approximation of the log-barrier function. And then the user-tuned feedback gain for the floating base task is replaced by a new one constructed from the solved Riccati gain. Extensive simulations conducted in MuJoCo with a point bipedal robot, as well as real-world experiments performed on a quadruped robot, demonstrate the effectiveness of the proposed method. In the different bipedal locomotion tasks, compared with the user-tuned method, the proposed approach is at least 12% better and up to 50% better at linear velocity tracking, and at least 7% better and up to 47% better at angular velocity tracking. In the quadruped experiment, linear velocity tracking is improved by at least 3% and angular velocity tracking is improved by at least 23% using the proposed method.


Toward Adaptive Cooperation: Model-Based Shared Control Using LQ-Differential Games

Varga, Balint

arXiv.org Artificial Intelligence

This paper introduces a novel model-based adaptive shared control to allow for the identification and design challenge for shared-control systems, in which humans and automation share control tasks. The main challenge is the adaptive behavior of the human in such shared control interactions. Consequently, merely identifying human behavior without considering automation is insufficient and often leads to inadequate automation design. Therefore, this paper proposes a novel solution involving online identification of the human and the adaptation of shared control using Linear-Quadratic differential games. The effectiveness of the proposed online adaptation is analyzed in simulations and compared with a non-adaptive shared control from the state of the art. Finally, the proposed approach is tested through human-in-the-loop experiments, highlighting its suitability for real-time applications.